AITopics | lexical complexity

Collaborating Authors

lexical complexity

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ChatGPT as Linguistic Equalizer? Quantifying LLM-Driven Lexical Shifts in Academic Writing

Lin, Dingkang, Zhao, Naixuan, Tian, Dan, Li, Jiang

arXiv.org Artificial IntelligenceApr-18-2025

The advent of ChatGPT has profoundly reshaped scientific research practices, particularly in academic writing, where non-native English-speakers (NNES) historically face linguistic barriers. This study investigates whether ChatGPT mitigates these barriers and fosters equity by analyzing lexical complexity shifts across 2.8 million articles from OpenAlex (2020-2024). Using the Measure of Textual Lexical Diversity (MTLD) to quantify vocabulary sophistication and a difference-in-differences (DID) design to identify causal effects, we demonstrate that ChatGPT significantly enhances lexical complexity in NNES-authored abstracts, even after controlling for article-level controls, authorship patterns, and venue norms. Notably, the impact is most pronounced in preprint papers, technology- and biology-related fields and lower-tier journals. These findings provide causal evidence that ChatGPT reduces linguistic disparities and promotes equity in global academia.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.12317

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Examining the Robustness of Large Language Models across Language Complexity

Zhang, Jiayi

arXiv.org Artificial IntelligenceJan-30-2025

With the advancement of large language models (LLMs), an increasing number of student models have leveraged LLMs to analyze textual artifacts generated by students to understand and evaluate their learning. These student models typically employ pre-trained LLMs to vectorize text inputs into embeddings and then use the embeddings to train models to detect the presence or absence of a construct of interest. However, how reliable and robust are these models at processing language with different levels of complexity? In the context of learning where students may have different language backgrounds with various levels of writing skills, it is critical to examine the robustness of such models to ensure that these models work equally well for text with varying levels of language complexity. Coincidentally, a few (but limited) research studies show that the use of language can indeed impact the performance of LLMs. As such, in the current study, we examined the robustness of several LLM-based student models that detect student self-regulated learning (SRL) in math problem-solving. Specifically, we compared how the performance of these models vary using texts with high and low lexical, syntactic, and semantic complexity measured by three linguistic measures.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.18738

Country: North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Technology (0.75)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Difficult for Whom? A Study of Japanese Lexical Complexity

Nohejl, Adam, Hayakawa, Akio, Ide, Yusuke, Watanabe, Taro

arXiv.org Artificial IntelligenceOct-24-2024

The tasks of lexical complexity prediction (LCP) and complex word identification (CWI) commonly presuppose that difficult to understand words are shared by the target population. Meanwhile, personalization methods have also been proposed to adapt models to individual needs. We verify that a recent Japanese LCP dataset is representative of its target population by partially replicating the annotation. By another reannotation we show that native Chinese speakers perceive the complexity differently due to Sino-Japanese vocabulary. To explore the possibilities of personalization, we compare competitive baselines trained on the group mean ratings and individual ratings in terms of performance for an individual. We show that the model trained on a group mean performs similarly to an individual model in the CWI task, while achieving good LCP performance for an individual is difficult. We also experiment with adapting a finetuned BERT model, which results only in marginal improvements across all settings.

annotator, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.18567

Country:

North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments

Carrasco-Farre, Carlos

arXiv.org Artificial IntelligenceApr-21-2024

Large Language Models (LLMs) are already as persuasive as humans. However, we know very little about how they do it. This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments. Using a dataset of 1,251 participants in an experiment, we compare the persuasion strategies of LLM-generated and human-generated arguments through measures of cognitive effort (lexical and grammatical complexity) and moral-emotional language (sentiment and morality). Our results indicate that LLMs produce arguments that require higher cognitive effort, exhibiting more complex grammatical and lexical structures than human counterparts. Additionally, LLMs demonstrate a significant propensity to engage more deeply with moral language, utilizing both positive and negative moral foundations more frequently than humans. In contrast with previous research, no significant difference was found in the emotional content produced by LLMs and humans. The fact that we show that there is no equivalence in process despite equivalence in outcome, contributes to the emergent knowledge regarding AI and persuasion, highlighting the dual potential of LLMs to both enhance and undermine informational integrity through persuasion strategies.

argument, llm, persuasion, (16 more...)

arXiv.org Artificial Intelligence

2404.09329

Country:

North America > United States > New York (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.66)

Industry:

Government (0.93)
Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Estimating Lexical Complexity from Document-Level Distributions

Wold, Sondre, Mæhlum, Petter, Hove, Oddbjørn

arXiv.org Artificial IntelligenceApr-1-2024

Existing methods for complexity estimation are typically developed for entire documents. This limitation in scope makes them inapplicable for shorter pieces of text, such as health assessment tools. These typically consist of lists of independent sentences, all of which are too short for existing methods to apply. The choice of wording in these assessment tools is crucial, as both the cognitive capacity and the linguistic competency of the intended patient groups could vary substantially. As a first step towards creating better tools for supporting health practitioners, we develop a two-step approach for estimating lexical complexity that does not rely on any pre-annotated data. We implement our approach for the Norwegian language and verify its effectiveness using statistical testing and a qualitative evaluation of samples from real assessment tools. We also investigate the relationship between our complexity measure and certain features typically associated with complexity in the literature, such as word length, frequency, and the number of syllables.

complexity, complexity score, lix score, (16 more...)

arXiv.org Artificial Intelligence

2404.01196

Country:

North America > United States > Virginia (0.04)
North America > United States > Utah > Utah County > Provo (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

Lexical Complexity Controlled Sentence Generation

Nie, Jinran, Yang, Liner, Chen, Yun, Kong, Cunliang, Zhu, Junhui, Yang, Erhong

arXiv.org Artificial IntelligenceNov-26-2022

Text generation rarely considers the control of lexical complexity, which limits its more comprehensive practical application. We introduce a novel task of lexical complexity controlled sentence generation, which aims at keywords to sentence generation with desired complexity levels. It has enormous potential in domains such as grade reading, language teaching and acquisition. The challenge of this task is to generate fluent sentences only using the words of given complexity levels. We propose a simple but effective approach for this task based on complexity embedding. Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity. And we demonstrate the feasibility of the approach for both training models from scratch and fine-tuning the pre-trained models. To facilitate the research, we develop two datasets in English and Chinese respectively, on which extensive experiments are conducted. Results show that our approach better controls lexical complexity and generates higher quality sentences than baseline methods.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.1454

Country:

Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Predicting Lexical Complexity in English Texts: The Complex 2.0 Dataset

Shardlow, Matthew, Evans, Richard, Zampieri, Marcos

arXiv.org Artificial IntelligenceNov-3-2022

Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred to as Complex Word Identification (CWI) and is often modelled as a supervised classification problem. For training such systems, annotated datasets in which words and sometimes multi-word expressions are labelled regarding complexity are required. In this paper we analyze previous work carried out in this task and investigate the properties of CWI datasets for English. We develop a protocol for the annotation of lexical complexity and use this to annotate a new dataset, CompLex 2.0. We present experiments using both new and old datasets to investigate the nature of lexical complexity. We found that a Likert-scale annotation protocol provides an objective setting that is superior for identifying the complexity of words compared to a binary annotation protocol. We release a new dataset using our new protocol to promote the task of Lexical Complexity Prediction.

annotation, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10579-022-09588-2

2102.08773

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.14)
North America > United States > California > San Diego County > San Diego (0.05)
(22 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback

Facebook's AI streamlines sentences while preserving meaning

#artificialintelligenceOct-17-2019, 19:42:09 GMT

Simplifying text's grammar and structure is a useful skill most of us acquire in school, but AI typically has a tougher go of it, owing to a lack of linguistic knowledge. That said, scientists at Facebook AI Research and Inria are progressing toward a simplification model dubbed ACCESS (AudienCe-CEntric Sentence Simplification), which they claim enables customization of text length, amount of paraphrasing, lexical complexity, syntactic complexity, and other parameters while preserving coherency. "Text simplification can be beneficial for people with cognitive disabilities, such as aphasia, dyslexia, and autism, but also for second language learners and people with low literacy," wrote the researchers in a preprint paper detailing their work. "The type of simplification needed for each of these audiences is different … Yet, research in text simplification has been mostly focused on developing models that generate a single generic simplification for a given source text with no possibility to adapt outputs for the needs of various target populations. To this end, the team tapped seq2seq, a general-purpose encoder-decoder framework that takes data and its context as inputs.

complexity, lexical complexity, simplification, (8 more...)

#artificialintelligence

Industry: Health & Medicine > Therapeutic Area > Neurology (0.58)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback

Age of Exposure: A Model of Word Learning

Dascalu, Mihai (University Politehnica of Bucharest) | McNamara, Danielle S. (Arizona State University) | Crossley, Scott (Georgia State University) | Trausan-Matu, Stefan (University Politehnica of Bucharest)

AAAI ConferencesApr-19-2016

Textual complexity is widely used to assess the difficulty of reading materials and writing quality in student essays. At a lexical level, word complexity can represent a building block for creating a comprehensive model of lexical networks that adequately estimates learners’ understanding. In order to best capture how lexical associations are created between related concepts, we propose automated indices of word complexity based on Age of Exposure (AoE). AOE indices computationally model the lexical learning process as a function of a learner's experience with language. This study describes a proof of concept based on the on a large-scale learning corpus (i.e., TASA). The results indicate that AoE indices yield strong associations with human ratings of age of acquisition, word frequency, entropy, and human lexical response latencies providing evidence of convergent validity.

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country:

North America > United States > District of Columbia > Washington (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York > New York County > New York City (0.04)
(12 more...)

Industry: Education > Assessment & Standards (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback